Abstract: Discrete input memoryless channels (DIMC) are 
considered. The symmetric capacity of a DIMC is the maximum 
achievable rate when a uniform channel input distribution is 
used. The existence and construction of codes (namely polar 
codes) that achieve the symmetric capacity were recently proven. 
To achieve the true capacity, a non-uniform mapping from M 
sockets to the channel input alphabet can be used. Coding is 
then over the uniformly distributed sockets and the channel 
input distribution resembles the capacity-achieving distribution. 
In this work, the construction of optimal non-uniform mappings 
is investigated. An efficient algorithm to find optimal mappings 
is proposed and the rate by which a target distribution is 
approached is investigated. The results are then applied to 
non-uniform mappings for AWGN channels with finite signal 
constellations. The mappings found by the proposed methods 
outperform those obtained via the central limit theorem approach 
as suggested in the literature. 

I. Introduction 

The capacity of a discrete input memoryless channel 
(DIMC) is given by the maximum mutual information between 
channel input and channel output, where the maximum is 
taken over all possible input probability mass functions (pmf). 
Unequal transition probabilities between input and output 
symbols, input power constraints, or input symbols of unequal 
durations can lead to non-uniform capacity-achieving input 
pmfs [1]. For a digital communication system to operate close 
to capacity, the pmf of the channel input symbols should 
therefore resemble the capacity-achieving pmf. Techniques to 
achieve this go under the name probabilistic shaping. Recently, 
Şaşoğlu et al. showed in [2] the existence and construction 
of error correcting codes, called polar codes, that achieve for 
arbitrary DIMCs the symmetric capacity, i.e., the maximum 
rate achievable by uniform input pmfs. This raises the question 
of how these codes can be used to achieve the true capacity. 

One possibility to solve this problem is by wrapping the 
channel by a super-channel that allows for uniform input. 
Gallager proposed in [3, p. 208] to use a non-uniform mapping 
from M sockets to the channel input alphabet to realize such a 
super-channel. An example of such a mapping is displayed in 
Fig. 1. This mapping transforms a uniform distribution over 
4 sockets into the non-uniform pmf $d_1 = 1/4$, $d_2 = 3/4$, 
$d_3 = 0$. In the context of polar codes, the non-uniform 
mapping approach is briefly discussed in [2, Sec. III.D]. The 
purpose of the non-uniform mapping is to make the generated 
pmf resemble the capacity-achieving pmf. If this can only be 
achieved by a very large M, it would not be practical, since 
error correction coding has to be done over the M sockets and 
therefore, the coding complexity increases with M, see [4]. 
This motivates our work on how these non-uniform mappings 
should be constructed and on how to assess their performance. 

[Fig. 1: non-uniform mapping from 4 sockets to the channel input alphabet.]

Assuming a uniform distribution over M sockets, each 
mapping generates an M-type pmf, i.e., a pmf where each 
symbol probability can be written as c/M for some non-negative 
integer c. Conversely, there exists for 
each M-type pmf d a mapping that generates it. Note that the 
mapping is in general many-to-one and not necessarily onto. 
The mapping in Fig. 1 is an example. We will in the following 
focus on the construction of M-type pmfs; the corresponding 
mapping is immediate. 
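The correspondence between mappings and M-type pmfs can be sketched in a few lines of Python. This is only an illustration: the function names and the 0-based symbol indexing are assumptions of this sketch, and the particular assignment of sockets to symbols is one possible choice that reproduces the pmf (1/4, 3/4, 0) quoted for Fig. 1.

```python
from collections import Counter
from fractions import Fraction


def pmf_from_mapping(mapping, n):
    """Return the M-type pmf generated by a socket-to-symbol mapping.

    mapping[s] is the (0-based) symbol index assigned to socket s.
    The M = len(mapping) sockets are assumed to be uniformly distributed.
    """
    M = len(mapping)
    counts = Counter(mapping)
    return [Fraction(counts.get(i, 0), M) for i in range(n)]


def mapping_from_counts(counts):
    """Return one mapping that generates the M-type pmf counts / M."""
    mapping = []
    for symbol, c in enumerate(counts):
        mapping.extend([symbol] * c)
    return mapping


# Hypothetical socket assignment: 4 sockets, 3 symbols, the third symbol unused.
fig1_mapping = [0, 1, 1, 1]
fig1_pmf = pmf_from_mapping(fig1_mapping, n=3)
```

The second function shows why constructing the pmf is the only real work: once the integer counts are known, any assignment of that many sockets to each symbol is a valid mapping.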

We ask the following two questions: 
Q1 When we increase M, how fast can an M-type pmf converge to the target pmf? 
Q2 For a finite M, how can we find the M-type pmf that optimally approximates the target pmf? 

In [4, Sec. IV.B], Abbe and Barron consider question Q1 
for the AWGN channel. For $M = 2^m$, they suggest using 
the binomial coefficients divided by M as probabilities for 
an (m + 1)-PAM constellation. They call their method the 
central limit theorem (CLT) approach and they show that the 
gap to the AWGN capacity 0.5 log(1 + snr) scales as 1/log(M). 
Schreckenbach proposed in [5] a greedy algorithm to construct 
an M-type pmf based on a target pmf; however, the author 
addresses neither Q1 nor Q2. 

In this work, we use the relative entropy D(d||t) as a measure 
for how well d approximates the target pmf t. The 
reason is that this measure is an upper bound for the loss of 
mutual information when a pmf d different from the capacity-achieving 
pmf t is used [1]. Regarding question Q1, we show 
that the relative entropy decreases as 1/M. For question Q2, 
we propose an efficient algorithm that finds the M-type pmf 
that minimizes D(d||t). The complexity of our algorithm is 
O(Mn), where n is the number of entries of the target pmf t. 

Fig. 2. An illustrating example for quantization as defined in (2). First, 
the left-open interval (0, 1] is partitioned into M = 4 left-open uniform 
intervals of length 1/M. Second, the interval (0, 1] is partitioned into 
left-open intervals whose lengths are the probabilities of the target pmf. Finally, 
the approximation of $t_i$ is given by the number of uniform intervals 
whose middle points lie within the interval that corresponds to $t_i$. Thus, the 
quantization of $t^T = (0.09, 0.69, 0.22)$ is $d^T = \frac{1}{4}(1, 2, 1)$. 

The results are then applied to non-uniform mappings 
for AWGN channels with finite signal constellations. The 
mappings found by the proposed methods outperform those 
obtained via the CLT approach as suggested in [4, Sec. IV.B]. 

The remainder of this paper is organized as follows. In 
Sec. II, we state the considered problem. We then derive a 
convergence rate bound in Sec. III. Next, we derive in Sec. IV 
an algorithm to find optimal M-type approximations. In Sec. V 
and Sec. VI, we apply our methods to the AWGN channel and 
provide numerical results. 



II. Problem Statement 

A. Quantization 

We start by formally defining quantization. For notational 
convenience, we introduce for the target pmf t the correspond- 
ing cumulative distribution function (cdf) T. Its entries are 
given by 

$$T_i = \sum_{k=1}^{i} t_k, \quad i = 1, \dots, n. \tag{1}$$

Then, the i-th entry of the M-type approximation obtained by 
quantizing t is given by 

$$d_i = \frac{1}{M} \left| \left\{ \ell \in \{1, \dots, M\} : T_{i-1} < \frac{2\ell - 1}{2M} \leq T_i \right\} \right| \tag{2}$$

where we define $T_0 = 0$. An illustrating example is displayed 
in Fig. 2. Note that if $t_i = 0$, then $T_{i-1} = T_i$, which implies 
that the set on the right hand side is empty. Consequently, we 
have the following implication 

$$t_i = 0 \;\Rightarrow\; d_i = 0. \tag{3}$$

We make use of this implication later. For each i, the value 
of $d_i$ is bounded by 

$$t_i - \frac{1}{M} < d_i < t_i + \frac{1}{M} \tag{4}$$

which implies 

$$|t_i - d_i| < \frac{1}{M}. \tag{5}$$



From this observation, we immediately get the following 
proposition. 

Proposition 1. Denote by f a function from the set of pmfs 
with n entries to the set of real numbers. Denote by t a target 
pmf and assume that f is continuous in t. Then t can be 
approximated arbitrarily well by an M-type pmf in the sense 
that for any $\epsilon > 0$, there is an $M_0$ such that for all $M > M_0$ 
we have $|f(t) - f(d_M)| < \epsilon$, where $d_M$ is the M-type pmf 
found by quantizing t. 

This property is very general and applies to any continuous 
function defined on the probability simplex. In particular, it 
applies to information measures such as entropy and mutual 
information as a function of the channel input pmf. 
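The midpoint quantization described in this subsection can be sketched as follows. The sketch assumes the reading of the quantization rule given in the caption of Fig. 2 (a uniform interval is assigned to the target interval that contains its middle point, with left-open intervals); the function name, the use of exact rational arithmetic, and the example target pmf are choices made here for illustration only.

```python
from fractions import Fraction


def quantize(t, M):
    """M-type quantization of a pmf t by counting interval midpoints.

    Entry i receives one unit of mass 1/M for every midpoint
    (2l - 1) / (2M), l = 1, ..., M, that falls into (T_{i-1}, T_i],
    where T is the cdf of t. Exact fractions avoid rounding issues
    when a midpoint coincides with a cdf value.
    """
    t = [Fraction(x) for x in t]
    counts = [0] * len(t)
    cdf = Fraction(0)
    l = 1  # index of the next midpoint that has not been assigned yet
    for i, t_i in enumerate(t):
        cdf += t_i
        # Midpoints are increasing, so all midpoints <= T_{i-1} are already used.
        while l <= M and Fraction(2 * l - 1, 2 * M) <= cdf:
            counts[i] += 1
            l += 1
    return [Fraction(c, M) for c in counts]


# Hypothetical target pmf (not taken from the paper).
target = [Fraction(1, 10), Fraction(3, 5), Fraction(3, 10)]
d4 = quantize(target, 4)
```

For this target and M = 4, no midpoint falls into the first interval, so the first entry is quantized to zero even though its target probability is positive; the per-entry error nevertheless stays below 1/M.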

B. Minimizing Relative Entropy 

Proposition 1 is a qualitative result. It tells us that we can 
approximate a target pmf as closely as we want, but not how 
fast the approximation improves when M increases, nor how to 
approximate optimally for a finite M. To get such results, we have to be more specific 
about our measure of approximation. A measure of interest 
would be the gap to capacity that results from using an M-type 
pmf instead of the capacity-achieving pmf. In [4, Sec. IV.B], 
the authors derived a bound on the rate by which AWGN 
capacity can be approached using M-type pmfs. However, the 
derivation depends heavily on the Gaussianity of the noise. 
Getting similar results for general DIMCs is difficult. The 
relative entropy D(d||t) of the actually employed channel 
input pmf d and the capacity-achieving pmf t is an upper 
bound on the gap to capacity that results from using d [1, 
Sec. 3.4.3]. Relative entropy is simpler to analyze since the 
(possibly complicated) structure of the channel only enters 
via the capacity-achieving pmf. We will therefore use relative 
entropy as a measure for approximation and address question 
Q1 (rate of convergence) and question Q2 (optimal M-type 
pmf) with respect to relative entropy. 

III. Convergence Rate 

In this section, we derive a bound on how fast the relative 
entropy approaches zero as M grows. 

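The relative entropy achieved by the M-type pmf d obtained 
by quantizing t is bounded as 

$$
\begin{aligned}
D(d\|t) &= \sum_{i} d_i \log \frac{d_i}{t_i} && (6) \\
&= \sum_{i:\, d_i > 0} d_i \log \frac{d_i}{t_i} && (7) \\
&< \sum_{i:\, d_i > 0} d_i \log \frac{t_i + \frac{1}{M}}{t_i} && (8) \\
&= \sum_{i:\, d_i > 0} d_i \log\left(1 + \frac{1}{M t_i}\right) && (9)
\end{aligned}
$$

where we used (4) in (8). By (3), every term with $d_i > 0$ has $t_i > 0$, 
so all terms are well defined. We apply the inequality 
$\log(1 + x) \leq x$ to the last line and get 

$$D(d\|t) < \sum_{i:\, d_i > 0} d_i \log\left(1 + \frac{1}{M t_i}\right) \leq \sum_{i:\, d_i > 0} \frac{d_i}{M t_i}. \tag{10}$$

Thus we have shown the following. 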

Proposition 2. For a target pmf t, denote by $d_M$ the M-type 
pmf found by quantizing t. For $M \to \infty$, $D(d_M\|t) \propto \frac{1}{M}$. 
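The bound of this section can be checked numerically. The following sketch quantizes a hypothetical target pmf for growing M and compares the relative entropy with the right-hand side of (10), i.e., the sum of $d_i / (M t_i)$. The quantizer follows the midpoint rule described in the caption of Fig. 2; the target pmf, function names, and the list of M values are assumptions made for this illustration.

```python
import math
from fractions import Fraction


def quantize(t, M):
    """Midpoint quantization: count midpoints (2l-1)/(2M) in (T_{i-1}, T_i]."""
    counts = [0] * len(t)
    cdf = Fraction(0)
    l = 1
    for i, t_i in enumerate(t):
        cdf += t_i
        while l <= M and Fraction(2 * l - 1, 2 * M) <= cdf:
            counts[i] += 1
            l += 1
    return [Fraction(c, M) for c in counts]


def relative_entropy(d, t):
    """D(d||t) in nats; terms with d_i = 0 contribute nothing."""
    return sum(float(d_i) * math.log(float(d_i / t_i))
               for d_i, t_i in zip(d, t) if d_i > 0)


# Hypothetical target pmf with strictly positive entries.
target = [Fraction(1, 10), Fraction(3, 5), Fraction(3, 10)]

results = []
for M in (4, 16, 64, 256, 1024):
    d = quantize(target, M)
    divergence = relative_entropy(d, target)
    bound = sum(float(d_i / (M * t_i)) for d_i, t_i in zip(d, target))
    results.append((M, divergence, bound))
    print(f"M={M:5d}  D={divergence:.3e}  bound={bound:.3e}")
```

The bound shrinks like 1/M because the sum of $d_i / t_i$ stays bounded; the observed relative entropy is typically much smaller than the bound.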

IV. Optimal M-type pmf 

In this section, we consider the problem of finding optimal 
M-type pmfs. This can formally be stated as follows. Given 
is a target pmf t with n entries and a number M. We aim at 
solving the optimization problem 



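$$
\begin{aligned}
&\underset{d}{\text{minimize}} && D(d\|t) \\
&\text{subject to} && d \text{ is } M\text{-type.}
\end{aligned} \tag{12}
$$

A. Equivalent Problem 

Writing the entries of an M-type pmf as $d_i = c_i / M$ with non-negative 
integers $c_i$ that sum to M, we can write the objective function of problem (12) as 

$$
\begin{aligned}
D(d\|t) &= \sum_{i} d_i \log \frac{d_i}{t_i} && (13) \\
&= \sum_{i} \frac{c_i}{M} \log \frac{c_i / M}{t_i} && (14) \\
&= \frac{1}{M} \sum_{i} c_i \log \frac{c_i}{t_i} + \sum_{i} \frac{c_i}{M} \log \frac{1}{M} && (15) \\
&= \frac{1}{M} \sum_{i} c_i \log \frac{c_i}{t_i} - \log M. && (16)
\end{aligned}
$$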



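We conclude that an optimization problem equivalent to problem (12) is 

$$
\begin{aligned}
&\underset{c_1, \dots, c_n}{\text{minimize}} && \sum_{i=1}^{n} c_i \log \frac{c_i}{t_i} \\
&\text{subject to} && c_i \in \{0, 1, 2, \dots\}, \quad i = 1, \dots, n, \\
&&& \sum_{i=1}^{n} c_i = M.
\end{aligned} \tag{17}
$$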



From the solution $c^*$ of the latter problem, the solution of the 
original problem (12) is directly given by $d^* = c^* \cdot 1/M$. 
A vector c that fulfills the constraints of problem (17) will 
be called an allocation in the following. To evaluate the 
complexity of problem (17), let us look at the number of distinct 
allocations. To do this, line up M identical items in a row. 
Then partition the row into n (potentially empty) segments by 
putting n − 1 separators. For each separator, there are M + 1 
possible positions: to the left of the row, to the right of the 
row, or in-between items. By assigning to the ith entry of 
a vector the number of items in the ith segment, a partition 
defines an allocation. There are $(M + 1)^{n-1}$ such separator 
placements. Since the separators are interchangeable, several 
placements can define the same allocation, so the number of 
distinct allocations is $\binom{M+n-1}{n-1} \leq (M + 1)^{n-1}$. For fixed n, 
this number grows polynomially in M with degree n − 1. 
Exhaustive search is therefore infeasible for 
practical purposes. We next develop an algorithm that finds the 
optimal vector in O(Mn) steps, i.e., with a complexity that 
is both linear in M and linear in n. 
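For small instances, the set of allocations can be enumerated directly, which makes it possible to check both the counting argument and the identity (16) between the relative entropy and the integer objective. The sketch below uses a standard stars-and-bars enumeration; the function names, the example target pmf, and the choice M = 6 are assumptions made for illustration.

```python
import math
from itertools import combinations


def allocations(M, n):
    """Yield every vector of n non-negative integers that sums to M."""
    # Choose positions of n-1 separators among M + n - 1 slots (stars and bars).
    for bars in combinations(range(M + n - 1), n - 1):
        c, prev = [], -1
        for b in bars:
            c.append(b - prev - 1)  # items between consecutive separators
            prev = b
        c.append(M + n - 2 - prev)  # items after the last separator
        yield tuple(c)


def objective(c, t):
    """Integer objective: sum_i c_i log(c_i / t_i), with 0 log 0 = 0."""
    return sum(c_i * math.log(c_i / t_i) for c_i, t_i in zip(c, t) if c_i > 0)


def relative_entropy(d, t):
    """D(d||t) in nats."""
    return sum(d_i * math.log(d_i / t_i) for d_i, t_i in zip(d, t) if d_i > 0)


M = 6
t = [0.1, 0.6, 0.3]  # hypothetical target pmf
all_allocations = list(allocations(M, len(t)))
num_allocations = len(all_allocations)

# Check D(c/M || t) = (1/M) * sum_i c_i log(c_i/t_i) - log M for every allocation.
identity_holds = all(
    math.isclose(relative_entropy([c_i / M for c_i in c], t),
                 objective(c, t) / M - math.log(M),
                 rel_tol=1e-9, abs_tol=1e-9)
    for c in all_allocations
)
```

Even for this toy case there are 28 allocations; the count grows quickly with M and n, which is why enumeration is only useful as a correctness check.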



B. Algorithm 

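To see how we can efficiently solve problem (17), we write 
the objective function as 

$$
\begin{aligned}
\sum_{i=1}^{n} c_i \log \frac{c_i}{t_i} &= \sum_{i=1}^{n} \sum_{k=1}^{c_i} \left[ k \log \frac{k}{t_i} - (k-1) \log \frac{k-1}{t_i} \right] && (18) \\
&= \sum_{i=1}^{n} \sum_{k=1}^{c_i} \Delta_i(k) && (19)
\end{aligned}
$$

where $\Delta_i(k) := k \log \frac{k}{t_i} - (k-1) \log \frac{k-1}{t_i}$ is the increment 
function of the ith entry; the inner sum telescopes, and we use 
the convention $0 \log 0 = 0$. 

An allocation c can be obtained by initially assigning the 
all-zero vector to c and then successively incrementing the 
entries of c. After M steps, the constraint $\sum_i c_i = M$ is 
fulfilled and c is a valid allocation. If in some step, the jth 
entry is incremented by 1, then the corresponding increment of 
the objective function is $\Delta_j(c_j + 1)$. The following algorithm 
finds an allocation in a greedy manner. In each step, it 
increases by 1 the entry i with the smallest increment $\Delta_i(c_i + 1)$. 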

Algorithm 1. 

Initialize $c_i \leftarrow 0$, i = 1, ..., n. 
repeat M times 
    Choose $j = \operatorname{argmin}_i \Delta_i(c_i + 1)$. 
    Update $c_j \leftarrow c_j + 1$. 
end repeat 
Return c. 
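A direct Python transcription of Algorithm 1 might look as follows. It is a minimal sketch: the function names are chosen here, ties are broken in favor of the smallest index (the algorithm statement does not prescribe a tie-breaking rule), and entries with zero target probability are given an infinite increment so that they are never selected, which is an assumption consistent with the objective being infinite otherwise.

```python
import math


def increment(k, t_i):
    """Delta_i(k) = k log(k/t_i) - (k-1) log((k-1)/t_i), with 0 log 0 = 0."""
    if t_i == 0:
        return math.inf  # assumption: never allocate mass to a zero-probability symbol
    tail = (k - 1) * math.log((k - 1) / t_i) if k > 1 else 0.0
    return k * math.log(k / t_i) - tail


def greedy_allocation(t, M):
    """Return the allocation c (integer counts summing to M) found greedily."""
    c = [0] * len(t)
    for _ in range(M):
        # Pick the entry whose next increment of the objective is smallest.
        j = min(range(len(t)), key=lambda i: increment(c[i] + 1, t[i]))
        c[j] += 1
    return c


# Hypothetical target pmf; the resulting M-type pmf is c / M.
t = [0.1, 0.6, 0.3]
c = greedy_allocation(t, M=4)
d = [c_i / 4 for c_i in c]
```

Each of the M iterations scans all n entries, which gives the O(Mn) running time stated in the text.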

The following proposition states the optimality of Algorithm 1. 

Proposition 3. For a given target pmf t and a positive integer 
number M, the allocation c found by Algorithm 1 minimizes 
$\sum_i c_i \log \frac{c_i}{t_i}$ over all allocations; equivalently, d = c/M 
minimizes D(d||t) over all M-type pmfs d. 

The proof is given in the next subsection. 
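Before going through the proof, the claim can be sanity-checked by brute force on small instances. The sketch below compares the objective value reached by the greedy algorithm with the minimum over all allocations for a few randomly drawn target pmfs. Everything here (function names, the fixed random seed, instance sizes) is an assumption of this illustration, and targets are drawn with strictly positive entries to keep the code short.

```python
import math
import random
from itertools import combinations


def increment(k, t_i):
    """Delta_i(k) = k log(k/t_i) - (k-1) log((k-1)/t_i), with 0 log 0 = 0."""
    tail = (k - 1) * math.log((k - 1) / t_i) if k > 1 else 0.0
    return k * math.log(k / t_i) - tail


def greedy_allocation(t, M):
    """Greedy allocation: M times, increment the entry with the smallest increment."""
    c = [0] * len(t)
    for _ in range(M):
        j = min(range(len(t)), key=lambda i: increment(c[i] + 1, t[i]))
        c[j] += 1
    return c


def objective(c, t):
    """sum_i c_i log(c_i / t_i), with 0 log 0 = 0."""
    return sum(c_i * math.log(c_i / t_i) for c_i, t_i in zip(c, t) if c_i > 0)


def allocations(M, n):
    """Yield every vector of n non-negative integers that sums to M."""
    for bars in combinations(range(M + n - 1), n - 1):
        c, prev = [], -1
        for b in bars:
            c.append(b - prev - 1)
            prev = b
        c.append(M + n - 2 - prev)
        yield c


def greedy_matches_exhaustive(t, M):
    """True if the greedy objective equals the exhaustive minimum (up to rounding)."""
    best = min(objective(c, t) for c in allocations(M, len(t)))
    return math.isclose(objective(greedy_allocation(t, M), t), best,
                        rel_tol=1e-9, abs_tol=1e-9)


rng = random.Random(0)  # fixed seed, arbitrary choice
all_match = True
for _ in range(20):
    raw = [rng.random() + 0.01 for _ in range(4)]
    t = [x / sum(raw) for x in raw]
    for M in range(1, 9):
        all_match = all_match and greedy_matches_exhaustive(t, M)
```

Objective values are compared rather than allocations, since several allocations can be optimal when increments tie.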

C. Proof of Proposition 3 

Before we can give the proof, we need the following two 
lemmas. 

Lemma 1. For each i, if $k > \ell$ then $\Delta_i(k) > \Delta_i(\ell)$, i.e., the 
increment functions are strictly monotonically increasing. 

Proof: To show this, we interpret the increment function 
$\Delta_i$ as defined on the set of real numbers greater than 1 and 
calculate its first derivative. We find 

$$\frac{\partial \Delta_i(x)}{\partial x} = \log \frac{x}{x - 1} > 0 \tag{20}$$

and conclude that $\Delta_i(k)$ is strictly monotonically increasing 
in k. ■ 
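As a quick numerical illustration of Lemma 1, the following sketch evaluates the increment function for one hypothetical target probability, checks that consecutive increments are increasing, and compares a finite-difference slope with the closed-form derivative log(x/(x − 1)). The values of `t_i`, `x`, and the step `h` are arbitrary choices for this illustration.

```python
import math


def increment(x, t_i):
    """Delta_i(x) for real x >= 1, with the convention 0 log 0 = 0."""
    tail = (x - 1) * math.log((x - 1) / t_i) if x > 1 else 0.0
    return x * math.log(x / t_i) - tail


t_i = 0.3  # hypothetical target probability

# Increments at integer arguments k = 1, ..., 49.
values = [increment(k, t_i) for k in range(1, 50)]
is_increasing = all(a < b for a, b in zip(values, values[1:]))

# Central finite difference versus the closed-form derivative.
x, h = 5.0, 1e-6
numeric_slope = (increment(x + h, t_i) - increment(x - h, t_i)) / (2 * h)
closed_form_slope = math.log(x / (x - 1))
```

Note that the derivative does not depend on the target probability: $t_i$ only shifts the increment function by the constant $\log(1/t_i)$.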

Lemma 2. Denote by $c^*$ an optimal allocation. Denote by c a 
pre-allocation with $\sum_i c_i < M$ and $c_i \leq c_i^*$ for i = 1, ..., n. 
Then for $j = \operatorname{argmin}_i \Delta_i(c_i + 1)$ and some optimal allocation 
$\tilde{c}$, it holds that 

$$c_j + 1 \leq \tilde{c}_j. \tag{21}$$

Proof: Suppose 

$$c_j + 1 > c_j^*. \tag{22}$$

Since by assumption $c_j \leq c_j^*$, this implies 

$$c_j + 1 = c_j^* + 1. \tag{23}$$

Since $\sum_i c_i < M$ and $\sum_i c_i^* = M$, there has to be at least 
one $\ell \neq j$ with $c_\ell^* > c_\ell$ and equivalently, 

$$c_\ell^* \geq c_\ell + 1. \tag{24}$$

Now, by decreasing $c_\ell^*$ by one and increasing $c_j^*$ by one, the 
change of the objective function is $\Delta_j(c_j^* + 1) - \Delta_\ell(c_\ell^*)$. We 
can bound this change as follows. 

$$
\begin{aligned}
\Delta_j(c_j^* + 1) - \Delta_\ell(c_\ell^*) &\leq \Delta_j(c_j^* + 1) - \Delta_\ell(c_\ell + 1) && (25) \\
&= \Delta_j(c_j + 1) - \Delta_\ell(c_\ell + 1) && (26) \\
&\leq 0 && (27)
\end{aligned}
$$

where the first inequality follows by Lemma 1 and (24), 
equality in the second line follows by (23), and the second 
inequality follows by the definition of j. We have to consider 
two cases. First, assume strict inequality in either (25) or 
(27). Then the objective function is decreased, which is a 
contradiction to the assumption that $c^*$ is optimal. Thus, the 
supposition (22) is wrong and the statement of the lemma 
holds for $\tilde{c} = c^*$. Second, assume equality both in (25) and 
(27). In this case, the objective function remains unchanged. 
Consequently, the newly constructed allocation obtained by 
decreasing $c_\ell^*$ by one and increasing $c_j^*$ by one is also optimal. 
Denote it by $\tilde{c}$. By (23) and (24), $\tilde{c}$ fulfills the statement of 
the lemma. This concludes the proof. ■ 

By Lemma 2, there is an optimal allocation $\tilde{c}$ such that 
in each step of Algorithm 1, $c_i \leq \tilde{c}_i$, i = 1, ..., n. After 
termination, we have 

$$M = \sum_i c_i \leq \sum_i \tilde{c}_i = M. \tag{28}$$

This can only be true if $c_i = \tilde{c}_i$ for all i = 1, ..., n. 
Consequently, the constructed allocation c is optimal. This 
concludes the proof of Proposition 3. 

D. Complexity 

In each step, Algorithm 1 needs to find the minimum of 
a vector with n elements; the complexity of this is O(n). 
This is done M times, so the overall complexity is O(nM). 
We are aware that the complexity could be further reduced 
to O(M log n) by keeping the list of increments $\Delta_i(c_i + 1)$ 
sorted; however, the presented algorithm is simple to implement 
and it was fast enough for our numerical calculations. 
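One way to realize the O(M log n) variant mentioned above is to keep the next increments in a binary heap instead of a sorted list. The sketch below uses Python's standard `heapq` module; this is a swapped-in data structure chosen for illustration, not the implementation used for the numerical results, and the function names are assumptions.

```python
import heapq
import math


def increment(k, t_i):
    """Delta_i(k) = k log(k/t_i) - (k-1) log((k-1)/t_i), with 0 log 0 = 0."""
    if t_i == 0:
        return math.inf  # assumption: zero-probability symbols are never selected
    tail = (k - 1) * math.log((k - 1) / t_i) if k > 1 else 0.0
    return k * math.log(k / t_i) - tail


def greedy_allocation_heap(t, M):
    """Greedy allocation with a min-heap of (next increment, index) pairs."""
    c = [0] * len(t)
    heap = [(increment(1, t_i), i) for i, t_i in enumerate(t)]
    heapq.heapify(heap)  # O(n)
    for _ in range(M):
        _, j = heapq.heappop(heap)  # entry with the smallest next increment
        c[j] += 1
        heapq.heappush(heap, (increment(c[j] + 1, t[j]), j))  # O(log n)
    return c


t = [0.1, 0.6, 0.3]  # hypothetical target pmf
c_heap = greedy_allocation_heap(t, M=4)
```

Because heap entries are (increment, index) tuples, ties are resolved toward the smaller index, which matches a linear scan that keeps the first minimum.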

E. Summary 

Note that since the sub-optimal M-type pmfs obtained by 
quantization achieve a convergence rate of 1/M, this rate is 
also achieved by optimal M-type pmfs. We summarize the 
properties found for M-type approximations of a target pmf t 
in the following proposition. 

Proposition 4. Denote by $d_M$ the pmf that minimizes D(d||t) 
over all M-type pmfs. Then 

• For $M \to \infty$, $D(d_M\|t) \propto \frac{1}{M}$. 
• $\lim_{M \to \infty} D(d_M\|t) = 0$. 
• Algorithm 1 finds $d_M$ with a complexity of O(Mn). 

V. Overview: Approaching AWGN Capacity 

We now consider the problem of approaching AWGN capac- 
ity with polar codes. In this section, we briefly review existing 
results and in the next section, we propose a new scheme based 
on our results on M-type pmfs. Consider an AWGN channel 
with noise $N \sim \mathcal{N}(0, 1)$. The AWGN capacity is given by 

$$C(\mathrm{snr}) := \max_{E(X^2) \leq 1} I(X; X\sqrt{\mathrm{snr}} + N) = \frac{1}{2} \log(1 + \mathrm{snr}). \tag{29}$$

To be able to use polar coding for the AWGN channel, we need 
a discrete interface with $2^m$ points [4, Sec. IV.A]. We model 
this interface by an auxiliary random vector $Z^m$ with m binary 
entries $Z_i$ that are independent and uniformly distributed. 
Consequently, $Z^m$ is uniformly distributed over 

$$\mathcal{Z}^m = \{\underbrace{0 \cdots 0}_{m \text{ bits}},\, 0 \cdots 01,\, \dots,\, 1 \cdots 1\}. \tag{30}$$

Consider a discrete set $\mathcal{X}_n$ of $|\mathcal{X}_n| = n$ real valued signal 
points and a deterministic mapping 

$$g \colon \mathcal{Z}^m \to \mathcal{X}_n. \tag{31}$$

The constellation $\mathcal{X}_n$ and the mapping g are subject to the 
constraint 

$$E\{g(Z^m)^2\} \leq 1. \tag{32}$$

Define the gap to capacity as 

$$D_m(\mathrm{snr}, \mathcal{X}_n, g) := C(\mathrm{snr}) - I[g(Z^m); g(Z^m)\sqrt{\mathrm{snr}} + N]. \tag{33}$$

Of interest is now how the capacity gap scales with the number 
of bits m at the uniform interface. Two special cases are of 
interest: First, when $n = 2^m$ and the mapping g is one-to-one. 
In this case, the signal point pmf is uniform and optimization 
is only over the signal point positions $\mathcal{X}_n$. This approach is 
called geometric shaping. Second, the signal point positions 
$\mathcal{X}_n$ are restricted to be equidistant with distance $\Delta$. In this 
case, optimization is over the distance $\Delta$, the number of 
signal points n, and the mapping g. This approach is called 
probabilistic shaping. 

A. Previous result: geometric shaping 

Abbe and Barron show in [4, Sec. IV.C] the existence of a 
family $\mathcal{X}_n$ such that for $n = 2^m$ and g being one-to-one (we 
indicate this by writing $g_{\mathrm{id}}$), the gap to capacity scales as 

$$D_m(\mathrm{snr}, \mathcal{X}_{2^m}, g_{\mathrm{id}}) \propto 2^{-m} \tag{34}$$

i.e., there exist signal point constellations $\mathcal{X}_{2^m}$ such that the 
gap to capacity decreases exponentially in the number of bits 
m at the uniform interface when the mapping g is one-to-one. 
Note that the constellations $\mathcal{X}_{2^m}$ that achieve this behavior are 
not equidistant. 



B. Previous result: probabilistic shaping 

Abbe and Telatar propose in [6, Sec. V] to use m + 1 
equidistant signal points and binomial coefficients normalized 
by $2^m$ as a $2^m$-type distribution over these points. They call 
this scheme the CLT approach. We denote the equidistant 
signal points by $\mathcal{E}_{m+1}$ and the mapping defined by the 
binomial coefficients by $g_{\mathrm{clt}}$. Abbe and Barron show in [4, 
Sec. IV.B] that 

$$D_m(\mathrm{snr}, \mathcal{E}_{m+1}, g_{\mathrm{clt}}) \propto m^{-1} \tag{35}$$

i.e., with the CLT approach, the capacity gap scales as $m^{-1}$ 
in the number of bits at the uniform interface. Comparing (34) 
and (35), we see that geometric shaping outperforms the CLT 
approach. This motivates us to investigate if it is possible to 
do probabilistic shaping by non-uniform mapping in a way 
that improves upon the CLT approach. We will give a positive 
answer to this question in the next section. 
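The CLT construction is easy to write down explicitly. The sketch below builds the m + 1 equidistant points and the binomial $2^m$-type pmf, and scales the spacing so that the power constraint (32) is met with equality; meeting the constraint with equality and centering the points around zero are assumptions of this sketch, as is the function name.

```python
import math


def clt_constellation(m):
    """Return (points, pmf) of the CLT approach for m bits at the interface.

    The pmf is binomial(m, 1/2), i.e., binomial coefficients divided by 2^m.
    The points are centered and equidistant, scaled to unit average power
    (assumption: the power constraint is met with equality).
    """
    M = 2 ** m
    pmf = [math.comb(m, k) / M for k in range(m + 1)]
    base = [k - m / 2 for k in range(m + 1)]  # unit spacing, centered
    power = sum(p * x * x for p, x in zip(pmf, base))  # equals m / 4
    delta = 1 / math.sqrt(power)  # spacing after normalization
    points = [delta * x for x in base]
    return points, pmf


points, pmf = clt_constellation(4)
average_power = sum(p * x * x for p, x in zip(pmf, points))
```

For m = 4 the pmf is (1, 4, 6, 4, 1)/16 and the normalized spacing is 2/√m = 1, since a binomial(m, 1/2) distribution has variance m/4.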

VI. Improved Non-Uniform Mapping for AWGN 

In this section, we propose an alternative to the CLT 
approach. The key observation is that the CLT approach does 
not allow separate optimization over the constellation size 
and the input pmf. In fact, for a given m, the CLT approach 
provides m + 1 constellation points and a fixed pmf over 
these points independent of the snr. While for $m \to \infty$, this 
approach achieves capacity for any value of the snr, intuitively, 
this should be sub-optimal in general for finite values of m. 
This can be seen as follows. Fix m. For high enough snr, 
we expect among all $2^m$-type pmfs the uniform pmf over $2^m$ 
points to be optimal. However, the CLT approach limits the 
number of constellation points to m + 1. We therefore propose 
to optimize over both the cardinality of the constellation and 
the pmf. Note that there is a tradeoff between the constellation 
size and the pmf resolution. If we have n constellation points, 
we have on average a resolution of $2^m / n$ for the probability 
of each constellation point. 

A. Our Approach 

We first state our approach in form of an algorithm and then 
give details for each step. 

Algorithm 2. 

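for k = 2, ..., $2^m$ 
    1. $\mathcal{X}^{(k)}$ := k points: equidistant, normalized, centered. 
    2. Solve 
           maximize    $I(\Delta X \sqrt{\mathrm{snr}}; Y)$ 
           subject to  $X \sim p$, $X \in \mathcal{X}^{(k)}$, $E(|\Delta X|^2) \leq 1$. 
       Denote the optimal pmf by $p^{*(k)}$. 
    3. $d^{(k)}$ := $2^m$-type pmf that minimizes $D(d\|p^{*(k)})$. 
end for 
4. Choose n as the value of k for which $d^{(k)}$ achieves the smallest gap to capacity. 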

ad 2. In the kth step, we calculate the capacity-achieving 
pmf of a constellation that consists of k equidistant points. 
The optimization is both over the distance $\Delta$ of the points 
and over the input pmf. The optimization over $\Delta$ is done 
by line search and for each $\Delta$, the optimization over p is a 
convex optimization problem. We let $\Delta$ take a finite number of 
equally spaced values, and for each value, we solve the convex 
optimization problem by using CVX [7]. We then choose $p^*$ as 
the optimal pmf for the value of $\Delta$ that results in the greatest 
mutual information. 

ad 3. For the optimal pmf $p^*$ that we found in 2., we use 
Algorithm 1 to find the pmf that minimizes $D(d\|p^*)$ over all 
$2^m$-type pmfs d. Note that by [1, Prop. 5.10], [1, Prop. 3.11], 
and Pinsker's inequality [8, Theorem 1.5], if $D(d\|p^*) \to 0$, 
then the mutual information and the average power achieved 
by d converge respectively to the mutual information and the 
average power achieved by $p^*$. To avoid unfair comparison, we 
guarantee that the power constraint is fulfilled with equality by 
d by rescaling the constellation appropriately, i.e., we calculate 
the distance by 

$$E_{d^{(k)}}(|\Delta X|^2) = 1 \;\Rightarrow\; \Delta^{(k)} = \frac{1}{\sqrt{E_{d^{(k)}}(|X|^2)}}. \tag{36}$$

ad 4. For each constellation size 2, ..., $2^m$, the algorithm 
calculates a $2^m$-type pmf. Choose the one that yields the 
greatest mutual information. 
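The rescaling in (36) and the evaluation of the gap to capacity (33) for a given discrete input pmf can be sketched without any optimization library. The code below does not implement step 2 (the convex optimization solved with CVX); it only rescales a constellation to unit power and evaluates the mutual information by numerical integration. The function names, the trapezoidal rule, and the integration step and range are assumptions chosen for illustration.

```python
import math


def awgn_capacity(snr):
    """C(snr) = 0.5 log(1 + snr) in nats."""
    return 0.5 * math.log(1 + snr)


def rescale_to_unit_power(base_points, pmf):
    """Scale the points by Delta = 1 / sqrt(E|X|^2) so that the power equals 1."""
    power = sum(p * x * x for p, x in zip(pmf, base_points))
    delta = 1 / math.sqrt(power)
    return [delta * x for x in base_points]


def mutual_information(points, pmf, snr, step=0.01, tail=8.0):
    """I(X; X sqrt(snr) + N) in nats for N ~ N(0, 1), by trapezoidal integration.

    Uses I = h(Y) - h(N), where the density of Y is a Gaussian mixture.
    """
    centers = [math.sqrt(snr) * x for x in points]
    lo, hi = min(centers) - tail, max(centers) + tail
    steps = int(round((hi - lo) / step))
    h_y = 0.0
    for k in range(steps + 1):
        y = lo + k * step
        p_y = sum(p * math.exp(-0.5 * (y - c) ** 2)
                  for p, c in zip(pmf, centers)) / math.sqrt(2 * math.pi)
        if p_y > 0:
            weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
            h_y -= weight * p_y * math.log(p_y) * step
    h_n = 0.5 * math.log(2 * math.pi * math.e)  # differential entropy of N
    return h_y - h_n


# Example: two equiprobable points at snr = 1 (0 dB).
snr = 1.0
pmf = [0.5, 0.5]
points = rescale_to_unit_power([-0.5, 0.5], pmf)
info = mutual_information(points, pmf, snr)
gap = awgn_capacity(snr) - info
```

Running the same two functions on each candidate $d^{(k)}$ and keeping the candidate with the smallest gap corresponds to step 4 of Algorithm 2.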

B. Numerical Results 

We apply Algorithm 2 for signal-to-noise ratios of 0 dB and 
5 dB, i.e., snr takes the values 1 and approximately 3.16, respectively. We 
let m take the values 1, 2, 3, 4, 5, 6. Fig. 3 (a) and (c) show 
the results for 0 dB and Fig. 3 (b) and (d) show the results 
for 5 dB. We only discuss the results for 0 dB; the results for 
5 dB are similar. For each value of m, we display in Fig. 3 
(a) by a blue circle the results for the CLT approach. The 
horizontal coordinate represents the position of a signal point 
and the vertical coordinate its probability. The black points 
connected by a line represent the target pmf $p^{*(n)}$ and the 
red cross represents its $2^m$-type approximation $d^{(n)}$ as chosen 
by Algorithm 2 in line 4. As can be seen, for m = 1, 2, 3, 
Algorithm 2 recovers the $2^m$-type pmf obtained via the CLT 
approach. For m = 4, 5, 6, the $2^m$-type pmfs chosen by 
Algorithm 2 differ from the CLT pmfs. It is important to note 
that Algorithm 2 chooses a different number of signal points 
than the CLT approach. In Fig. 3 (c) the gap to capacity in 
nats is displayed. The blue line indicates the gap achieved 
by the CLT approach. The curve appears logarithmic in the 
logarithmic scale, which is consistent with the behavior 1/m 
as predicted by (35). The black connected points indicate the 
gap that the target pmfs would achieve. Note that the gap is 
not monotonically decreasing in m. The reason for this is that 
Algorithm 2 chooses in 4. the target pmf $p^{*(n)}$ according to 
the gap that is achieved by its $2^m$-type approximation $d^{(n)}$, 
not according to the gap that the target pmf would achieve by 
itself. The gap achieved by the $2^m$-type approximation of the 
target pmfs is displayed by connected red crosses. Note that 
this gap actually decreases monotonically with m. As expected 
from Fig. 3 (a), the gaps achieved by CLT and Algorithm 2 
are identical for m = 1, 2, 3. For m = 4, 5, 6, our approach 
outperforms the CLT approach. Note that this smaller gap is 
achieved by using a different number of signal points than the 
CLT approach suggests. This shows that our idea of optimizing 
both over the probabilities and the number of signal points is 
beneficial. 

Fig. 3. Comparison of CLT approach with optimal non-uniform mapping as proposed in this work. 

C. Conclusions 

The numerical results suggest looking beyond the CLT 
approach and searching for new analytical bounds for the gap 
that can be achieved by probabilistic shaping, i.e., equidistant 
constellations with non-uniform mappings. It may be possible 
that the scaling of geometric shaping (34) can also be achieved 
by probabilistic shaping. This would be an interesting property, 
since geometrically shaped constellations need quantizers at 
the receiver of much higher precision than equidistant 
constellations do. This makes the probabilistic shaping approach 
attractive for practical systems. 